> 



000 
001 
002 



003 Linearized Alternating Direction Method with 
Adaptive Penalty for Low-Rank Representation 



. 028 



030 

ON ■ 031 

o 



045 
046 
047 



004 
005 

006 

007 
008 

Anonymous Author(s) 
010 Affiliation 
^ ; Oil Address 

^ . 012 email 

Q-i' 013 

■ 015 

<N ; 016 

017 

(J , 018 



Abstract 



Low -rank representation (LRR) is an effective method for subspace clustering and 
has found wide applications in computer vision and machine learning. The exist- 
ing LRR solver is based on the alternating direction method (ADM). It suffers 



019 

020 from 0{n'^) computation complexity due to the matrix-matrix multiplications and 



q 

, 021 matrix inversions, even if partial SVD is used. Moreover, introducing auxiliary 

■ 022 variables also slows down the convergence. Such a heavy computation load pre- 

023 vents LRR from large scale applications. In this paper, we generalize ADM by lin- 

024 earizing the quadratic penalty term and allowing the penalty to change adaptively. 

025 We also propose a novel rule to update the penalty such that the convergence 

026 is fast. With our linearized ADM with adaptive penalty (LADMAP) method, it 



is unnecessary to introduce auxiliary variables and invert matrices. The matrix- 



matrix multiplications are further alleviated by using the skinny SVD representa- 
[ tion technique. As a result, we arrive at an algorithm for LRR with complexity 

■ 0{rn^), where r is the rank of the representation matrix. Numerical experiments 

verify that for LRR our LADMAP method is much faster than state-of-the-art al- 
gorithms. Although we only present the results on LRR, LADMAP actually can 
032 be applied to solving more general convex programs. 

033 
034 

035 1 Introduction 

! 036 

^ ' 037 Recently, compressive sensing fS] and sparse representation fTH) have been hot research topics and 

038 also have found abundant applications in signal processing and machine learning. Many of the 

039 problems in these fields can be formulated as the following convex programs: 

040 min/(x)+.g(y), s.t.^(x)+S(y) = c, (1) 

041 x.y 

042 where x, y and c could be either vectors or matrices, / and g are convex functions (e.g., the nuclear 

043 norm || • ||* |j2|, Frobenius norm || • ||, l2,i norm || • ||2,i HD, and li norm || • ||i), and A and B are 

044 linear mappings. 



As a measure of 2D sparsity, nuclear norm minimization (namely, /(X) ~ ||X||:,) has now attracted 
a lot of attention and has been successfully applied to video processing |10], matrix recovery lH, 
unsupervised learning |13| and semi-supervised learning |8|. Typical problems are matrix com- 

048 pletion |4|, robust principal component analysis |19| and their combination |3|. A nuclear norm 

049 minimization problem could be reformulated as a semidefinite program |4|, hence could be solved 

050 by any off-the-shelf interior point based toolbox, such as CVX. However, interior point methods 

051 cannot handle large scale matrices due to their 0{n^ ) complexity in each iteration, where n x n is 

052 the matrix size. To overcome this issue, several first-order algorithms have been developed to solve 

053 nuclear norm minimization problems. One method is the singular value thresholding (SVT) algo- 
rithm ||2l which applies the soft-thresholding operator to the singular values of a certain matrix in 



1 



each iteration. The accelerated proximal gradient (APG) algorithm ifTTI is also a popular choice due 
to its guaranteed 0{k^^) convergence rate, where k is the iteration number. The alternating direc- 
tion method (ADM) has also regained a lot of attention [TT TSl. It updates the variables alternately 

057 by minimizing the augmented Lagrangian function with respect to the variables in a Gauss-Seidel 

058 manner. 
059 

In 2010, Liu et al. 1 13 1 proposed the low-rank representation (LRR) for robust subspace clustering. 
Unlike the sparse representation |6|, which minimizes the number of nonzero entries in the repre- 
sentation matrix, LRR seeks to minimize the rank of the representation matrix. The mathematical 
model of LRR is as follow^]: 

min||Z||,+^||E||2i, s.t. X^XZ + E, (2) 

064 Z,E ' 

065 where X is the data matrix. LRR has found wide applications in computer vision and machine 

066 learning, e.g., motion segmentation, face clustering, and temporal segmentation ifTJl [TSl 171 . 

The existing LRR solver |fT3l is based on ADM. It suffers from O(n^) computation complexity due 
to the matrix-matrix multiplications and matrix inversions. Moreover, introducing auxiliary vari- 
ables also slows down the convergence, as there are more variables and constraints. Such a heavy 

070 computation load prevents LRR from large scale applications. In this paper, we generalize ADM 

071 by linearizing the quadratic penalty term and allowing the penalty to change adaptively. Lineariza- 

072 tion makes the auxiliary variables unnecessary, hence waiving the matrix inversions, while variable 

073 penalty makes the convergence fast. We also propose a novel and simple rule to update the penalty. 

074 We prove the global convergence of linearized ADM with adaptive penalty (LADMAP) and apply it 
to LRR, obtaining faster convergence speed than the original solver By further representing Z as its 
skinny SVD and utilizing an advanced functionality of the PROPACK 1 1 1] package, the complexity 
of solving LRR by LADMAP becomes only 0{rn^), as there is no full sized matrix-matrix multi- 
plications, where r is the rank of the optimal Z. Although we only present the numerical results on 
LRR, LADMAP actually can be applied to solving more general convex programs. 



083 
084 
085 



077 
078 
079 

080 Our work is inspked by Yang et al. | i2T| . Nonetheless, the difference of our work from theirs is 
distinct. First, they only proved the convergence of linearized ADM (LADM) for a specific prob- 
(jg2 lem, namely nuclear norm regularization. Their proof utilized some special properties of the nuclear 
norm, while we prove the convergence of LADM for general problems in ([T]i. Second, they only 
proved in the case of fixed penalty, while we prove in the case of variable penalty. Although they 
mentioned the dynamic updating rule proposed in [9J, their proof cannot be straightforwardly ap- 
plied to the case of variable penalty. Moreover, that rule is for ADM only. Third, the convergence 

086 speed of LADM heavily depends on the choice of penalty. So it is difficult to choose an optimal 

087 fixed penalty that fits for different problems and problem sizes, while our novel updating rule for the 

088 penalty, although simple, is effective for different problems and problem sizes. 
089 
090 
091 
092 
093 

094 ADM is now very popular in solving large scale sparse representation problems fT^. When solving 

095 (Uli by ADM, one operates on the following augmented Lagrangian function: 

096 B 

097 /:A(x,y,A) = /(x)+,g(y) + (A,^(x)+S(y)-y) + |p(x)+6(y)-c|p, (3) 

098 where A is the Lagrange multiplier, (•, •) is the inner product, and /3 > is the penalty parameter 

099 The usual augmented Lagrange multiplier method is to minimize w.r.t. x and y simultaneously. 

100 This is usually difficult and does not exploit the fact that the objective function is separable. To 

101 remedy this issue, ADM decomposes the minimization of £a w.rt. (x, y) into two subproblems 

102 that minimize w.rt. x and y, respectively. More specifically, the iterations of ADM go as follows: 

103 Xfe+i = argmin£A(x,yfc, Afc), (4) 
104 
105 



2 Linearized Alternating Direction Method with Adaptive Penalty 
2.1 The Alternating Direction Method 



yfc+i = argmin£^(xfc+i,y, Afc), (5) 



Afe+i = Afc+/3[^(xfc+i)+S(yfc+i)-c]. (6) 

107 

'Here we switch to bold capital letters in order to emphasize that the variables are matrices. 
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In compressive sensing and sparse representation, as / and g are usually matrix or vector norms, the 
subproblems (|4| and (|5]l usually have closed form solutions when A and B are identities li2l [T3ll22l . 
^"10 In this case, ADM is appealing. However, in many problems A and B are not identities. For 

111 example, in matrix completion A can be a selection matrix, and in ID sparse representation A can 

112 be a general matrix. In this case, there are no closed form solutions to (HI and (|5]l. To overcome 

113 this difficulty, a common strategy is to introduce auxiliary variables Iil3n l,l u and v and reformulate 

114 problem ([Til into an equivalent one: 

115 mill /(x)+5(y), s.t. yl(u)+S(v) = c,x = u,y = v, (7) 

116 x,y,u,v 

117 and the corresponding ADM iterations analogous to (IHi-© can be deduced. With more variables 

118 and more constraints, the convergence of ADM becomes slower. Moreover, to update u and v, 

119 whose subproblems are least squares problems, matrix inversions are often necessary. 

120 

121 2.2 Linearized ADM 
122 
123 
124 
125 

126 With minor algebra, one can see that subproblem (|4]i is equivalent to 
127 
128 
129 
130 
131 

132 Xfc+i = argmin/(x) + {A* (Xk) + l3A*{A{^k) + B{yk) - c),x-Xfc) + ^||x-Xfe||2 

= argmin/(x) + ^||x - x^ + ^*(A, + P{A{^k) + B{yu) - c))/{Pr^A)\\\ 

135 ^ (9) 

136 where A* is the adjoint of A and 77^1 > is a parameter whose proper value will be analyzed later 
.| The above approximation resembles that of APG 1 17J , but we do not use APG to solve © iteratively. 

1 38 Similarly, subproblem (|5]l can be approximated by 

139 o 

140 Yk+i = argmin5(y) + ^||y - + 6* (Afe + p{A{^k+i) + B{yk) - c))/{P^B)f. (10) 

y 2 

141 

142 The update of Lagrange multiplier still goes as (|6]Q. 
143 

144 2.3 Adaptive Penalty 



145 
146 
147 



150 
151 



To avoid introducing auxiliary variables and still solve subproblems (|4]i and (|5]l efficiently, inspired 
by Yang et al. 121], we propose a linearization technique for ^ and Q. To further accelerate the 
convergence of the algorithm, we also propose an adaptive rule for updating the penalty parameter. 



xfe+i =argmin/(x) + ^P(x)+6(yfc)-c + Afc//3||2. (8) 

By linearizing the quadratic term in ([8]l at x^. and adding a proximal term, we have the following 
approximation: 



In previous ADM and LADM approaches lfT6ll22ll2Tl . the penalty parameter /3 is fixed. Some schol- 
ars have observed that ADM with a fixed /3 can converge very slowly and it is nontrivial to choose an 
optimal fixed /3. So is LADM. Thus a dynamic /3 is preferred in real applications. Although Tao et 
al. ||T6l and Yang et al. II2TI mentioned He et al.'s adaptive updating rule (|9l in their papers, the rule 
1''^ is for ADM only. We propose the following adaptive updating strategy for the penalty parameter j3\ 

/3fc+l = min(^niax,P/3fc); (11) 

152 where /3max is an upper bound of {Pk}- The value of p is defined as 

n-f if^fcmax(7^||xfc+i -Xfc||,7^||yfc+i -yfc||)/||c|| < £2, f-.^. 

^^"^ ^ ~ 1 1, otherwise, ^^^> 

155 

156 where po > 1 is a constant. The condition to assign p = po comes from the analysis on the stopping 
.(57 criteria (see Section |23] |. We recommend that Pq — ae2, where a depends on the size of c. Our 
.|gg updating rule is fundamentally different from He et al.'s for ADM 19J, which aims at balancing the 
errors in the stopping criteria and involves several parameters. 



160 2^sin|21|, we can also introduce a parameter 7 and update A as Afc+i — Afc+7/3[^(xfc+i) + B(yfc+i) — c]. 

161 We choose not to do so in this paper in order not to make the exposition of LADMAP too complex. The 
reviewers can refer to Supplementary Material for full details. 
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2.4 Convergence of LADMAP 



-^fc77A(xfc+i-xt)-^*(Afe+i) G a/(xfc+i), -^fc77i3(yfc+i-yfc)-S*(Afe+i) e ag(yfe+i), (13) 



This can be easily proved by checking the optimality conditions of ^ and ( fTOl l. 



162 
163 

164 To prove the convergence of LADMAP, we first have the following propositions. 
165 

166 Proposition! 
167 
168 

169 where Xk+i = Xk + l3k[A{-x.k) + B{yk) - c], Xk+i = Xk + l3k[A{xk+i) + B{yk) - c], anddf and 

170 dg are subgradients of f and g, respectively. 
171 
172 
1 73 

Proposition 2 Denote the operator norms of A and B as \\A\\ and \\B\\, respectively. If {Pk} is 
^"^^ non-decreasing and upper bounded, rj A > ?7_b > \\B\\^, and {x* ,y* , X*) is any Karush-Kuhn- 

175 Tucker (KKT) point of problem (0 (see (I23-(123A then: (1). {?7A||xfe - x*||2 - \\A{xk - x*)||2 + 

77B|lxfe - x*||2 + l3^^\\Xk - A*||2} is non-increasing. (2). ||xfc+i - Xfc|| ^ 0, ||yfc+i - yfc|| ^ 0, 
177 ||Afc+i-Afc|| ^0. 
178 

179 The proof can be found in Supplementary Material. Then we can prove the convergence of 

1 80 LADMAP, as stated in the following theorem. 

181 
182 
183 

1 2^ The proof can be found in Appendix lAl 
185 
186 
187 

188 The KKT conditions of problem ([T]i are that there exists a triple (x*, y*, A*) such that 



Tlieorem 3 IfiPk} is non-decreasing and upper bounded, rjA > WAW^, and i]b > WBW^, then the 
sequence { (x^ , y^ , Afc ) } generated by LADMAP converges to a KKT point of problem ([7}. 



2.5 Stopping Criteria 



189 
190 
191 

1 92 The triple (x* , y * , A* ) is called a KKT point. So the first stopping criterion is the feasibiUty : 

193 

194 M(xfe+i)+S(yfc+i)-c||/||c|| <ei. (16) 

195 
196 

197 -PkivBiVk+i - yk)+B*{A{xk+i - xfe))] - B*CXk+i) e dg{yk+i). (17) 

198 

Sofor Afe+1 to satisfy the second KKT condition, both /JfcTyAllxfc+i-Xfcll and ^fe||7;_B(yA;+i - yt) + 



^(x*)+6(y*)-c = 0, (14) 
-^*(A*) e a/(x*), -6*(A*) e dg{y*). (15) 



As for the second KKT condition, we rewrite the second part of Proposition[T]as follows 



B* {A{'x.k+i — Xfc))|| should be small enough. This leads to the second stopping criterion: 



200 

201 /3femax(ry^||xfc+i - ^k\\/\\A* {c)\\,r^B\\yk+i - yfc||/||6*(c)||) < e'^. (18) 

By estimating ||y^*(c)|| and ||;B*(c)|| by y^^||c|| and y^77B||c||, respectively, we arrive at the second 
stopping criterion in use: 



203 
204 

205 ^fcmax(^/^|lxfc+i -Xfc||,0?^|lyfe+i -yfc||)/||c|| < £2. (19) 

206 

207 Finally, we summarize our LADMAP algorithm in Algorithm[Tl 
208 

209 3 Applying LADMAP to LRR 

210 

211 In this section, we apply LADMAP to solving the LRR problem (|2|l. We also compare LADMAP 

21 2 with other state-of-the-art algorithms for LRR. The reason we choose LRR as an example of applica- 

213 tions of LADMAP is twofold. First, LRR has become an important mathematical model in machine 

214 learning. Second, unlike other established nuclear norm minimization problems, such as matrix 

215 completion |4| and robust principal component analysis [3J, if not carefully treated, the complexity 
of solving LRR is still 0{n^), even if partial SVD is used. 
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Algorithm 1 LADMAP for Problem O 



216 
217 

Initialize: Set ei > 0, £2 > 0, /3max > /3o > 0, ry^ > r/s > , xq, yo, Aq, and /c ^ 0. 

while ( fT6] i or ( fT9] i is not satisfied do 
Step 1: Update x by solving Q. 

221 Step 2: Update y by solving ( flOl l. 

222 Step 3: Update A by ©. 

223 Step 4: Update /3 by (HUi and ([Hli. 

224 Step 5: fc fc + 1. 

225 end while 
226 

227 

228 3.1 Solving LRR by LADMAP 
229 

230 As the LRR problem (|2]i is a special case of problem ([U, LADMAP can be directly applied to it. 

231 The two subproblems both have closed form solutions. In the subproblem for updating E, one may 

232 apply the /2,1-norm shrinkage operator 1 13 1, with a threshold fi^^, to matrix Mj. = — XZj. + X — 

233 Ak/Pk- In the subproblem for updating Z, one has to apply the singular value shrinkage operator [21, 

234 with a threshold (/3fc?7x)"\ to matrix = - 77^^X^(XZfc + E^+i - X + Ak/Pk), where 

235 Vx > o'max(^)- -ff Nfc is formed explicitly, the usual technique of partial SVD, using PROPACK 

236 ifTTl . with rank prediction llT2ll2n ri7l can be utilized to compute the leading r singular values and 
associated vectors of N^. efficiently, making the complexity of SVD computation 0{rn^), where r 
is the predicted rank of Zk+i and n is the column number of X. Note that as /3k is non-decreasing, 
the predicted rank is almost non-decreasing, making the iterations computationally efficient. 

239 

240 Up to now, LADMAP for LRR is still of complexity O(n^), although partial SVD is already used. 

241 This is because forming and requires full sized matrix-matrix multiplications, e.g., XZ^. 

242 To break this complexity bound, we introduce another technique to further accelerate LADMAP for 

243 LRR. By representing Zj. as its skinny SVD: Zj. — UfeS^V^, some of the full sized matrix-matrix 

244 multiplications are gone: they are replaced by successive reduced sized matrix-matrix multiplica- 

245 tions. For example, when updating E, XZ^ is computed as ((XUfc)Sfc)V^, reducing the com- 
plexity to 0{rn^). When computing the partial SVD of N^, things are more complicated. If we 
form Nfc explicitly, we will face with computing X^(X + Ak/ (ik), which is neither low-rank nor 
spars^ Fortunately, in PROPACK the bi-diagonalizing process of is done by the Lanczos pro- 

248 cedure 1 1 1], which only requires to compute matrix-vector multiplications N^v and u-^N^, where 

249 u and v are some vectors in the Lanczos procedure. So we may compute N^v and u-'^Nfc by mul- 

250 tiplying the vectors u and v successively with the component matrices in N^, rather than forming 

251 Nfc explicitly. So the computation complexity of partial SVD of is still 0{rn?). Consequently, 

252 with our acceleration techniques, the complexity of our accelerated LADMAP for LRR is 0{rii?). 

253 The accelerated LADMAP is summarized in Algorithmic] 

254 

255 3.2 Comparison with Other Methods 
256 

257 As shown in [T3l, the LRR problem can be solved by the classic ADM. However, their algorithm 

258 requires an auxiliary variable, matrix- matrix multiplication and inversion of matrices, resulting in 

259 O(n^) computation complexity and slow convergence. 
260 

The LRR problem can also be solved approximately by being reformulated to the following un- 

constrained optimization problem: min/3(||Z||, + /i||E||2,i) + i||X - XR - E|p, where /? > 

262 Z,E 

263 is a relaxation parameter. Then APG with the continuation technique |fT7| , which is to reduce /3 

264 gradually by Pk+i = inax(/3niin, 6'/?^), can be applied to solve this problenfl Compared with APG, 
2gg which can only find an approximate solution, LADMAP can produce a much more accurate solution 

as it is proven to converge to an exact solution. 

266 

267 

268 'when forming Nfc explicitly, X"^XZfc can be computed as ((X"^(XUfc))Sfc)V^, whose complexity is 

269 still O(rn^), while X"^Efc+i could also be accelerated as Efc+i is a column-sparse matrix. 

""Please see Supplementary Material for the detail of solving LRR by APG. 
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271 
272 



Algorithm 2 Accelerated LADMAP for LRR © 



276 
277 
278 
279 
280 



Input: Observation matrix X and parameter /i > 0. 

Initialize: Set Eo, Zo and Aq to zero matrices, where Zq is represented as (Uq, Sq, Vq) -s— 
(0, 0, 0). Set £i > 0, £2 > 0, /3max > ^0 > 0, r?x > o-max(X), and fc 0. 
while ( fT6] l or ( fT9] l is not satisfied do 
275 Step 1: Update Efe+i = argmin^||E|l2,i + ^||E + (XUfc)SfcV^ - X + Ak/hf. This 

subproblem can be solved by using Lemma 3.2 in lilSJ . 

Step 2: Update the skinny S VD (Uj,+i , Sfc+i , V^+i ) of 7ik+i - First, compute the partial SVD 
U,.SrV^ of the implicit matrix N^:, which is bi-diagonalized by the successive matrix-vector 
multiplication technique described in Section lTTl and the rank r is predicted as in i 121 1211 [TtII . 
Second, U,+i = U,(:, 1 : r'), ^k+i = : r',! : r') - {hvx)-^^, ^k+i = V,(:, 1 : r'), 

where r' is the number of singular values in that are greater than {Pk'Hx) ^- 

282 Step 3: Update A^+i = A^ + /3fc((XUfc+i)Sfc+iV^+i + E^+i - X). 

283 Step 4: Update fik+i by (HB-dlll- 

284 Step 5: fc ^ fc + 1. 

285 end while 
286 

287 

288 The Unearization technique has also been applied to other optimization methods. For example, 

289 Yin ||23l applied this technique to the Bregman iteration for solving compressive sensing problems 

290 and proved that the linearized Bregman method converges to an exact solution conditionally . In 
2g-| comparison, LADMAP always converges to an exact solution. 

292 
293 
294 
295 

In this section, we report numerical results on the standard LADMAP, the accelerated LADMAP and 
other state-of-the-art algorithms, including APG, ADIvfl and LADM, for LRR based data clustering 

297 problems. APG, ADM, LADM and LADMAP all utiUze the Matlab version of PROPACK 1 1 1 1. 

298 Por the accelerated LADMAP, we provide two function handles to PROPACK which fulfils the 

299 successive matrix-vector multiplications. All experiments are run and timed on a PC with an Intel 

300 Core i5 CPU at 2.67GHz and with 4GB of memory, running Windows 7 and Matlab version 7.10. 
301 

We test and compare these solvers on both synthetic multiple subspaces data and the real world 
■'''2 motion data (Hopkinl55 motion segmentation database |18]). For APG, we set the parameters 

303 ^ 0.01, /3„iin = 10"^°, = 0.9 and the Lipschitz constant t = ct^j^^(X). The parameters 

304 of ADM and LADM are the same as that in ITS) and 1211, respectively. In particular, for LADM 

305 the penalty is fixed ai P — 2.5/ min(m, n), where m x n is the size of X. For LADMAP, we set 

306 £i = 10-4^ £2 = 10-^ /3o = min(m,n)£2, /3„,ax = 10^", po = 1-9, and r^x - 1.02a2^,JX). 

307 As the code of ADM was downloaded, its stopping criteria, ||XZfc + E^ — X||/||X|| < £i and 

308 max(||Efc - Efc_i||/||X||, ||Zfc - Zfc_i||/||X||) < £2, are used in all our experiment^ 

309 

310 4.1 On Synthetic Data 
311 

312 The synthetic test data, parameterized as (s, p, d, f), is created by the same procedure in |[T3ll . s 
independent subspaces {Si}f^i are constructed, whose bases are generated by U^+i = 

TU,;, 1 < i < s — 1, where T is a random rotation and Ui is a d x f random orthogonal matrix. 
So each subspace has a rank of r and the data has an ambient dimension of d. Then p data points 
are sampled from each subspace by X^ = UiQi, I < i < s, with being an f x p i.i.d. zero 
mean unit variance Gaussian matrix Af{0, 1). 20% samples are randomly chosen to be corrupted by 
adding Gaussian noise with zero mean and standard deviation 0.1||x||. We empirically find that LRR 
3^8 achieves the best clustering performance on this data set when fi — 0.1. So we test all algorithms 

319 with /i = 0.1 in this experiment. To measure the relative errors in the solutions, we run the standard 

320 LADMAP 2000 iterations with ^^ax = 10^ to establish the ground truth solution (Eq, Zq). 

321 

322 ^We use the Matlab code provided online by the authors of 1131 . 

323 ^Note that the second criterion differs from that in il9\ . However, this does not harm the convergence of 
LADMAP because il9\ is always checked when updating Pk+i (see il2\ ). 



315 
316 
317 



4 Experimental Results 
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The computational comparison is summarized in Table [T] We can see that the iteration numbers 
and the CPU times of both the standard and accelerated LADMAP are much less than those of 
other methods, and the accelerated LADMAP is further much faster than the standard LADMAP. 
Moreover, the advantage of the accelerated LADMAP is even greater when the ratio f j-p, which is 
roughly the ratio of the rank of Zo to the size of Zq, is smaller, which testifies to the complexity 
estimations on the standard and accelerated LADMAP for LRR. It is noteworthy that the iteration 
numbers of ADM and LADM seem to grow with the problem sizes, while that of LADMAP is rather 
constant. Moreover, LADM is not faster than ADM. In particular, on the last data we were unable to 
wait until LADM stopped. Finally, as APG converges to an approximate solution to (|2]l, its relative 
errors are larger and its clustering accuracy is lower than ADM and LADM based methods. 

Table 1: Compai'ison among APG, ADM, LADM, standard LADMAP and accelerated LADMAP 
(denoted as LADMAP(A)) on the synthetic data. For each quadruple (s, p, d, f), the LRR problem, 
with /i — 0.1, was solved for the same data using different algorithms. We present typical run- 
ning time (in x 10"^ seconds), iteration number, relative error (%) of output solution (E, Z) and the 
clustering accuracy (%) of tested algorithms, respectively. 



Size (s, p, d, f) 


Method 


Time 


Iter 


l|Z-Zoll 


1 E — Eol 


Acc. 










IIZoll 


IIEoll 






APG 


0.0332 


110 


2.2079 


1.5096 


81.5 




ADM 


0.0529 


176 


0.5491 


0.5093 


90.0 


(10, 20,200,5) 


LADM 


0.0603 


194 


0.5480 


0.5024 


90.0 




LADMAP 


0.0145 


46 


0.5480 


0.5024 


90.0 




LADMAP(A) 


0.0010 


46 


0.5480 


0.5024 


90.0 




APG 


0.0869 


106 


2.4824 


1.0341 


80.0 




ADM 


0.1526 


185 


0.6519 


0.4078 


83.7 


(15,20,300,5) 


LADM 


0.2943 


363 


0.6518 


0.4076 


86.7 




LADMAP 


0.0336 


41 


0.6518 


0.4076 


86.7 




LADMAP(A) 


0.0015 


41 


0.6518 


0.4076 


86.7 




APG 


1.8837 


117 


2.8905 


2.4017 


72.4 




ADM 


3.7139 


225 


1.1191 


1.0170 


80.0 


(20, 25, 500, 5) 


LADM 


8.1574 


508 


0.6379 


0.4268 


80.0 




LADMAP 


0.7762 


40 


0.6379 


0.4268 


84.6 




LADMAP(A) 


0.0053 


40 


0.6379 


0.4268 


84.6 




APG 


6.1252 


116 


3.0667 


0.9199 


69.4 




ADM 


11.7185 


220 


0.6865 


0.4866 


76.0 


(30, 30, 900, 5) 


LADM 


N.A. 


N.A. 


N.A. 


N.A. 


N.A. 




LADMAP 


2.3891 


44 


0.6864 


0.4294 


80.1 




LADMAP(A) 


0.0058 


44 


0.6864 


0.4294 


80.1 



Table 2: Comparison among APG, ADM, LADM, standard LADMAP and accelerated LADMAP 
on the Hopkins 155 database. We present their average computing time (in seconds), average number 
of iterations and average classification errors (%) on all 156 sequences. 





Two Motion 


Three Motion 


All 




Time 


Iter 


CErr. 


Time 


Iter 


CErr 


Time 


Iter 


CErr. 


APG 


15.7836 


90 


5.77 


46.4970 


90 


16.52 


22.6277 


90 


8.36 


ADM 


53.3470 


281 


5.72 


159.8644 


284 


16.52 


77.0864 


282 


8.33 


LADM 


9.6701 


110 


5.77 


22.1467 


64 


16.52 


12.4520 


99 


8.36 


LADMAP 


3.6964 


22 


5.72 


10.9438 


22 


16.52 


5.3114 


22 


8.33 


LADMAP(A) 


2.1348 


22 


5.72 


6.1098 


22 


16.52 


3.0202 


22 


8.33 



4.2 On Real World Data 

We further test the performance of these algorithms on the Hopkins 155 database IfTSl . This database 
consists of 156 sequences, each of which has 39 to 550 data vectors drawn from two or three motions. 
For computational efficiency, we preprocess the data by projecting it to be 5-dimensional using PCA. 
As /i 2.4 is the best parameter for this database |fT3l , we test all algorithms with /i = 2.4. 
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Table |2l shows the comparison among APG, ADM, LADM, standard LADMAP and accelerated 
■'^^ LADMAP on this database. We can also see that the standard and accelerated LADMAP are much 

380 faster than APG, ADM, and LADM, and the accelerated LADMAP is also faster than the standard 

381 LADMAP. However, in this experiment the advantage of the accelerated LADMAP over the standard 

382 LADMAP is not as dramatic as that in Table[T] This is because on this data /i is chosen as 2.4, which 

383 cannot make the rank of the ground truth solution Zq much smaller than the size of Zq. 
384 

5 Conclusions 

386 
387 

In this paper, we propose a linearized alternating direction method with adaptive penalty (LADMAP) 
apply it to solving the LRR problem. With linearization, auxiliary variables need not be intro- 

389 duced for closed-form solutions, when the objective functions are matrix or vector norms. Moreover, 

390 with fewer variables and constraints, the convergence also becomes faster Allowing the penalty to 

391 change adaptively further accelerates the convergence of LADM. When applying LADMAP to LRR, 

392 by representing the representation matrix as its skinny S VD, full sized matrix-matrix multiplications 

393 are avoided by using successive reduced sized matrix-matrix multiplications instead, and successive 

394 matrix-vector multiplications are introduced to compute the partial SVD. Finally, we are able to 
ggg solve LRR at a computation complexity of 0{rn^), which is highly advantageous over the exist- 
ing LRR solvers. Numerical results demonstrate that LADMAP converges faster than LADM and 
ADM and our acceleration techniques are effective on LRR. Although we only present results on 
LRR, LADMAP is actually a general method that can be applied to other convex programs. We will 
test it with more problems in sparse representation in the future. 



A Proof of Theorem |3] 



396 
397 
398 
399 
400 
401 
402 

4Q3 Proof By Proposition |2] (1), {(x^, y^, A^)} is bounded, hence has an accumulation point, say 

404 i^kj , Yk, , Afcj. ) (x°°, y°°, A°°). We accomplish the proof in two steps. 

405 1. We first prove that (x°° , y°° , A°° ) is a KKT point of problem dTJ. 
406 
407 
408 

409 By letting fc = fcj — 1 in Proposition[T]and the definition of subgradient, we have 
410 
411 
412 



By Proposition |2] (2), ^(x^+i) + B{yk+i) — c — (3^, ^(Afe+i — Afc) — > 0. This shows that any 
accumulation point of {(xfe, yfe)} is a feasible solution. 



/(xfej+.9(yfej < f{^*)+g{y*) + {^k,-^\-Pk,-ilA{^k,-^k,-i)-A*{Xk,)) 

+{yk, - y*, -l3k,-iVB{yk, - yk,-i) - B*{Xk,)). 

413 Let j +00, by observing Proposition|2](2), we have 

"^^"^ /(x°°)+5(y°°) < /(x*)+.g(y*) + (x--x*,-^*(A-)> + (y- -y*,-6*(A-)) 

= /(x*) + .9(y*) - (^(x- - x*), A°°) - (S(y- - y*), A^)) 
= /(x*) + .g(y*) - (^(x-) + S(y°°) - ^(x*) - B{y*),X°°) 
= /(x*)+5(y*), 



/(x) > /(xfcj + (x-Xfc^.,-/3fc^._irM(xfc^. -xfe^_i) -^*(AfcJ>, Vx. (20) 



415 
416 
417 
418 

4.|g where we have used the fact that both (x°°, y°°) and (x*, y*) are feasible solutions. So we conclude 
that (x°°, y°°) is an optimal solution to 

421 Again, let k — kj — 1 in Proposition[T]and by the definition of subgradient, we have 
422 
423 

424 Fix X and let j — > +oo, we see that 

426 /(x) > /(X-) + (x-x-,-^*(A°°)), Vx. 

427 So -A*i\°°) e 5/(x°°). Similarly, -B*{\°°) e dg{y°°). Therefore, (x°", y°°, A°°) is a KKT 

428 point of problem ([T]i. 

^29 2. We next prove that the whole sequence {(x^, y^, Afe)} converges to (x°°, y°°, A°°). 
430 

43.1 By choosing (x*,y*, A*) = (x°°, y°°, A°°) in Proposition]!] we have Ty^Uxfc^. - x°°|p - ||^(xfe^ - 
x°°)||2 + 77B||yfc^. - y°°||2 + l3,:^\\Xk^ - A°°||2 0. By Proposition |2| (1), we readily have 
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"^^^ 7M||xfc-X°°||2-P(xfc-X°°)||2 + ,;B||yfe-y-||2 + /3-2j|A,-A-||2^0. So (xfc , yfc , Afc ) ^ 

433 (^OO^yOO^J^OO)^ 

434 

435 y°°: '^°°) can be an arbitrary accumulation point of {(xfc, yfc, \k)}, we may conclude that 

{ (xfc , Yfc , Afc ) } converges to a KKT point of problem ([T]i. 

437 
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